The work presented in this article studies how context information can be used in automatic sound event detection, and how the detection system can benefit from such information. Humans use context information to make more accurate predictions about sound events and to rule out events that are unlikely in a given context. We propose a similar utilization of context information in the automatic sound event detection process. The proposed approach is composed of two stages: an automatic context recognition stage and a sound event detection stage. Contexts are modeled using Gaussian mixture models, and sound events are modeled using three-state left-to-right hidden Markov models. In the first stage, the audio context of the tested signal is recognized. Based on the recognized context, a context-specific set of sound event classes is selected for the sound event detection stage. The event detection stage also uses context-dependent acoustic models and count-based event priors. Two alternative event detection approaches are studied. In the first, a monophonic event sequence is produced by detecting the most prominent sound event at each time instant using Viterbi decoding. The second approach introduces a new method for producing a polyphonic event sequence by detecting multiple overlapping sound events through multiple restricted Viterbi passes. A new metric is introduced to evaluate sound event detection performance at various levels of polyphony. It combines detection accuracy and a coarse time-resolution error into a single metric, simplifying the comparison of detection algorithms. The two-stage approach was found to improve the results substantially compared to the context-independent baseline system. At the block level, detection accuracy can be almost doubled by using the proposed context-dependent event detection.
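To make the two-stage pipeline concrete, the sketch below illustrates it in Python under simplifying assumptions: acoustic features (e.g., MFCCs) are already extracted as a frames-by-coefficients array, contexts are scored with trained scikit-learn GaussianMixture models, and, for brevity, the paper's three-state HMMs with Viterbi decoding are replaced by frame-wise GMM scoring combined with count-based event priors. All names here (recognize_context, detect_events, context_gmms, event_models, event_priors) are illustrative, not taken from the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def recognize_context(features, context_gmms):
    """Stage 1: recognize the audio context of the tested signal.

    Picks the context whose GMM gives the highest average
    log-likelihood over the whole recording.
    features: np.ndarray of shape (n_frames, n_coeffs)
    context_gmms: dict mapping context name -> fitted GaussianMixture
    """
    scores = {ctx: gmm.score(features) for ctx, gmm in context_gmms.items()}
    return max(scores, key=scores.get)

def detect_events(features, event_models, event_priors):
    """Stage 2 (monophonic variant, simplified): detect the most
    prominent sound event at each frame by combining the acoustic
    log-likelihood with a count-based log-prior.

    The full system would instead run Viterbi decoding over
    three-state left-to-right HMMs; frame-wise GMM scoring is a
    simplification for this sketch.
    event_models: context-specific dict, event name -> fitted GaussianMixture
    event_priors: context-specific dict, event name -> prior probability
    """
    events = list(event_models)
    # (n_frames, n_events) matrix of log-likelihood + log-prior
    scores = np.column_stack(
        [event_models[e].score_samples(features) + np.log(event_priors[e])
         for e in events]
    )
    best = scores.argmax(axis=1)
    return [events[i] for i in best]

# Usage: the recognized context selects the context-specific event set,
# acoustic models, and priors before event detection is run.
#   ctx = recognize_context(mfcc, context_gmms)
#   event_seq = detect_events(mfcc, event_models[ctx], event_priors[ctx])
```

The key design point the sketch reflects is that the recognized context gates everything downstream: only the event classes plausible in that context are scored, with context-dependent models and priors, which is what allows unlikely events to be ruled out.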